ABSTRACT



This project at the Texas College of Osteopathic Medicine
(Fort Worth) evaluated the use of an artificial-intelligence-derived measure,
the "Knowledge-Based Inference Tool" (KBIT), as the basis for assessing medical
students' diagnostic capabilities and designing instruction to improve
diagnostic skills. The instrument was designed to address the problem that,
in medicine, diagnostic expertise is problem-specific and appears to depend
more on the student's knowledge base than on cognitive skills. This study
determined that the KBIT produced reliable and valid measures (validity based
on comparisons of the diagnostic accuracy of experts with that of novices)
for four different problem areas: Weakness, Red Eye, Papulosquamous
Disorders, and Elevated Creatinine. Additionally, the study showed that two
expert/KBIT-derived instructional approaches significantly improved the
diagnostic accuracy of treatment student groups when compared to a control
group and to conventionally trained students. After the executive summary
and a project overview, this report describes the project's background and
origins, its components and activities, and its results. Attached is a related
article titled "An Expert Program Shell Designed for Extracting Disease
Prototypes and Their Use as Models for Exploring the 'Strong Problem-Solving
Methods' Employed in Clinical Reasoning" (F.J. Papa; S. Meyer). (DB)
SUMMARY



In medicine, diagnostic expertise is problem-specific. Furthermore, diagnostic
expertise appears to depend on the 'knowledge base' rather than on 'cognitive
skills'. Unfortunately, conceptual and logistical problems associated with
current medical assessment methodologies make it difficult to obtain reliable
and valid, problem-specific, knowledge base-dependent measures of diagnostic
capabilities.

In the early 1980s, one author (FJP) demonstrated that an artificial
intelligence-derived tool could be used to acquire a problem-specific knowledge
base from medically trained individuals. This tool made it possible to draw
reliable, valid, and logistically feasible inferences about diagnostic
capabilities in a given problem area. With funding from FIPSE, we set out to
determine: 1) whether the AI tool (called KBIT, the Knowledge-Based Inference
Tool) could provide reliable and valid measures of diagnostic capabilities
across a number of problem areas (i.e., is KBIT generalizable?), and 2) whether
KBIT-derived instruction could result in improved diagnostic capabilities.

We have recently demonstrated KBIT's generalizability by producing reliable
and valid (diagnostic accuracy of experts > novices) measures of diagnostic
performance in each of four distinctly different problem areas (Weakness, Red
Eye, Papulosquamous Disorders and Elevated Creatinine). KBIT's ability to
produce psychometrically sound, problem-specific, knowledge-based
assessments of diagnostic capabilities made it possible to isolate and identify
the knowledge base elements which characterize 'expertise'.

We subsequently demonstrated that two expert/KBIT-derived instructional
approaches significantly improved the diagnostic accuracy of treatment student
groups when compared to a control group and a group of students trained with
conventional instructional approaches. We believe that KBIT can serve as the
foundation for the development of a new generation of psychometrically
sound, 'intelligent' assessment and instructional tools.
EXECUTIVE SUMMARY



Project Overview: In medicine, diagnostic expertise is problem-specific.
Furthermore, diagnostic expertise appears to depend on the 'knowledge base'
rather than on 'cognitive skills'. Unfortunately, conceptual and logistical
problems associated with current medical assessment methodologies make it
difficult to obtain reliable and valid, problem-specific, knowledge
base-dependent measures of diagnostic capabilities.

In the early 1980s, one author (FJP) demonstrated that an artificial
intelligence-derived tool could be used to acquire a problem-specific knowledge
base from medically trained individuals. This tool made it possible to draw
reliable, valid, and logistically feasible inferences about diagnostic
capabilities in a given problem area. With funding from FIPSE, we set out to
determine: 1) whether the AI tool (called KBIT, the Knowledge-Based Inference
Tool) could provide reliable and valid measures of diagnostic capabilities
across a number of problem areas (i.e., is KBIT generalizable?), and 2) whether
KBIT-derived instruction could result in improved diagnostic capabilities.

We have recently demonstrated KBIT's generalizability by producing reliable and
valid (diagnostic accuracy of experts > novices) measures of diagnostic
performance in each of four distinctly different problem areas (Weakness, Red
Eye, Papulosquamous Disorders and Elevated Creatinine). KBIT's ability to
produce psychometrically sound, problem-specific, knowledge-based assessments
of diagnostic capabilities made it possible to isolate and identify the knowledge
base elements which characterize 'expertise'.

We subsequently demonstrated that two expert/KBIT-derived instructional
approaches significantly improved the diagnostic accuracy of treatment student
groups when compared to a control group and a group of students trained with
conventional instructional approaches. We believe that KBIT can serve as the
foundation for the development of a new generation of psychometrically sound,
'intelligent' assessment and instructional tools.

Purpose: For approximately the past forty years, 8-12% of all patients at autopsy
have been found to have died a premature death from a missed diagnosis. Further,
major missed illnesses with equivocal impact upon survival are present in
another 20% of all autopsies. These findings reinforce the notion that the
diagnostic process is an extremely difficult cognitive task.

Clearly, this less than optimal level of diagnostic performance must derive,
in part, from deficiencies in the assessment methodologies and instructional
approaches utilized during medical training. The purpose of this investigation
was to determine whether artificial intelligence-derived tools could improve the
assessment methodologies and instructional interventions related to
diagnostic capabilities.



Background and Origins: In the early 1980s, one author (FJP) was intrigued
by the notion that artificial intelligence (AI)-derived decision-making tools
could achieve levels of performance equal to those of experts in well-defined
problem areas across a variety of professions. Common to all of these AI tools
was the fact that their performance depended almost exclusively upon the
knowledge base with which they operated. Put simply, if the AI tool's knowledge
base was acquired from an expert, then its performance would be superior to
that of the same AI tool operating with a knowledge base acquired from a less
knowledgeable individual.

This author became interested in the notion that AI tools might serve as the
basis of a new generation of assessment instruments in medical education. The
advantages of AI-derived assessment instruments are as follows. One,
performance assessments could be problem-specific and knowledge-based (given
that expertise was problem-specific and knowledge base-dependent, it made sense
to develop testing methodologies congruent with the nature of expertise). Two,
once the subject's knowledge base for diagnosing a given problem was in the AI
tool, that knowledge base could be challenged by hundreds to thousands of
problem-specific test cases. This could solve the logistical problems which
adversely affected the reliability of conventional methodologies (i.e.,
methodologies wherein only one or two test cases could be used rather than the
tens of test cases needed to produce reliable problem-specific performance
measures).

Three, if in fact the AI-based performance measures of experts were superior to
novice performance measures, then this element of construct validity could
legitimize the further use of AI tools as a means of exploring the knowledge base
characteristics which distinguished experts from novices. Four, if the knowledge
base elements which contributed to the experts' superior diagnostic performance
could be isolated via these tools, then these same critical knowledge base
elements could be fashioned into instructional units designed to explicitly
impart expertise. Explicitly structured, expert-derived instructional units could
make it possible to improve the efficiency and effectiveness of the medical
educational process and, in turn, the novice's diagnostic accuracy.

Project Description: Originally, the authors had committed themselves to
exploring the generalizability of KBIT as an assessment instrument via
investigations involving six distinct medical problem domains. During the first
two years of this project, the investigators were able to successfully demonstrate
KBIT's generalizability (in terms of reliability and validity) over the first four
problem domains (Weakness, Red Eye, Papulosquamous Disorders and Elevated
Creatinine). These investigations involved over one hundred board-certified
experts in neurology, ophthalmology, dermatology and nephrology and over two
hundred junior and senior medical students.
Given this level of success, the authors deferred activities related to the
remaining two problem domains (both are now underway) and focused their
primary efforts on identifying the knowledge base characteristics which
distinguished experts from novices. Investigations into the four validated
problem domains revealed that the experts' knowledge bases not only achieved
higher levels of diagnostic accuracy but also achieved higher pattern
recognition measures (i.e., the pattern matching and pattern discrimination
levels of experts > novices).

These pattern recognition measures reflected the decision-making paradigm
upon which the AI-derived assessment tool was based: that diagnostic
performance is a pattern recognition phenomenon. The authors further
hypothesized that this pattern recognition phenomenon involved the dual
processes of matching and discriminating a patient's constellation of signs and
symptoms against internalized 'prototypical' disease patterns. The paradigm
further held that these disease prototypes were derived from the subject's
knowledge base (i.e., a prototype was an abstracted, highly structured knowledge
base, or disease template, consisting of ranked and weighted, disease-specific
signs and symptoms).

The authors subsequently developed various means of extracting and describing
expert-derived, disease-specific knowledge bases and prototypes. We then
hypothesized that these expert-derived, disease-specific descriptions
could enable students to achieve higher levels of diagnostic accuracy than control
(untrained) students and students trained via conventional medical educational
approaches. The results of our pilot AI-derived instructional approaches
demonstrated that explicitly structured problem- and disease-specific knowledge
bases, when imparted to novice medical students, resulted in statistically higher
levels of diagnostic accuracy than those of control or conventionally trained
students.

Project Results: (See Tables 1 and 2 on the next page)

Summary and Conclusions: Moderately to highly reliable and valid,
problem-specific assessments of diagnostic accuracy are logistically possible.
KBIT-derived, explicitly structured problem- and disease-specific knowledge base
elements and prototypes (Table 2, groups 4 and 5), when imparted to novice
medical students, produce statistically higher levels of diagnostic accuracy than
those of control (Table 2, group 1) or conventionally trained students (Table 2,
group 2).

The availability of psychometrically sound, problem-specific measures of 
diagnostic capabilities and knowledge base acquisition techniques now makes it 
feasible to use KBIT as the foundation of a new generation of educationally 
sound, 'intelligent' assessment and instructional tools. 
Project Results:



Table 1. Results of KBIT-based assessment instrument capabilities
(generalizability) in terms of reliability and construct validity (Student's
t test) across multiple problem areas.

Problem area               Reliability estimate    Student's t (one-tailed)
                           (K-R 21) (students)     (Experts > Novices)

Weakness                   .89                     p < .000
Red Eye                    .95                     p < .011
Papulosquamous Disorders   .71                     p < .001
Elevated Creatinine        .96                     p < .000



Table 2. Results of KBIT-derived instructional treatments (groups 3, 4 & 5)
designed to produce diagnostic performance increases.

I. ANOVA: F ratio 5.8074    F probability < .0006

II. Student-Newman-Keuls Procedure

                 Groups
             4   5   3   2   1
Groups
    4
    5
    3
    2        *   *
    1        *   *

* Significantly different groups

Group 1 untrained; Group 2 trained with conventional approaches;
Groups 3, 4, & 5 trained with various KBIT-derived approaches.



BODY OF REPORT 



Project Overview: 

In medicine, diagnostic expertise is problem-specific and knowledge base
dependent. Unfortunately, conceptual and logistical problems associated with
current medical assessment methodologies preclude educators from achieving
reliable and valid measures of problem-specific diagnostic capabilities.
Furthermore, the autopsy literature clearly points out that instruction in the
extremely difficult task of diagnosis must improve if physicians are to reduce
the persistently high levels of misdiagnosis.

While the medical education literature cries out for innovative assessment and
instruction approaches as a means of solving these intractable problems, few, if
any, researchers have investigated the potential of AI tools in these arenas. This
project represents an effort primarily designed to use the latest knowledge and
tools derived from the cognitive sciences to solve these long-standing medical
assessment and instructional problems.

For approximately twenty years, investigators in the field of artificial
intelligence (AI) have been using a variety of computer-based tools to emulate
and study human decision making. One fruitful area of activity has involved
the use of an AI tool known as the Expert System (ES). These tools have
characteristics which, on theoretical grounds, make them ideal as assessment
instruments for measuring diagnostic capabilities. Some of these characteristics
are as follows.

One, an ES is designed to solve problems or cases in a single problem
area (i.e., to identify the most likely cause of a given problem from among a
number of possible causes). Two, once the knowledge base needed to solve a
given problem is acquired from a subject, the ES can use it to solve literally
hundreds to thousands of problem cases. Three, an ES can usually solve
numerous problem cases in seconds to minutes. Four, the criteria used
to determine whether a given problem case was solved correctly or incorrectly
can be precisely and consistently applied to all problem cases in its case data
bank.

Five, one inherent aspect of an ES is that a knowledge base acquired from an
expert is likely to make the ES perform in a manner superior to an ES using a
knowledge base acquired from an individual with intermediate or novice-level
knowledge. Six, the knowledge base of an ES can be investigated to
determine why the knowledge base acquired from the expert performed in a
manner superior to the knowledge base acquired from a novice.

From a medical educator's perspective, the first five characteristics translate
into the following assessment advantages. One, given that diagnostic expertise is
problem-specific and knowledge base (and not cognitive skills) dependent,
an ES-based assessment tool would appear to be an ideal means of acquiring,
controlling for and assessing the diagnostic utility of an individual's knowledge
base for solving (diagnosing) cases in a given problem area. Two, the logistical
problems associated with the reliability of an assessment instrument
generally stem from the limited number of case challenges which an
examinee can physically work through in a given unit of time. Given that the ES
contains the subject's knowledge base, it can be challenged with
hundreds to thousands of problem-specific test cases and thereby achieve a
given level of test reliability when supplied with a sufficient number of test
cases.

Three, this great number of test case challenges can be solved by an ES in
essentially no more time than it took to acquire the subject's knowledge base in
the first place, thereby solving the logistical problem (lengthy test-taking
time) associated with traditional testing formats. Four, measurement error
attributable to case presentation-related variance and examiner-related variance
can be eliminated, as the ES applies the same assessment criteria to
all cases and for all subjects. Five, the long-sought and rarely, if ever,
attained goal of 'construct validity' is likely to be achieved given the nature of
the ES (i.e., the performance of expert-derived knowledge bases is likely to be
superior to that of novice-derived knowledge bases).
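
The link between the number of test cases and test reliability described above
can be illustrated with the Spearman-Brown prophecy formula from classical test
theory. The report does not state which projection, if any, the investigators
used, so the following Python sketch is an assumption. The function names are
hypothetical, and the formula presumes that any added cases are comparable to
the existing ones.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the added test cases behave like the existing ones.
    """
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


def cases_needed(current_reliability: float, current_cases: int,
                 target_reliability: float) -> int:
    """Smallest number of comparable cases predicted to reach the target."""
    factor = (target_reliability * (1.0 - current_reliability)) / (
        current_reliability * (1.0 - target_reliability)
    )
    # Subtract a tiny tolerance so floating-point noise does not add a case.
    return math.ceil(factor * current_cases - 1e-9)
```

Under these assumptions, a hypothetical 100-case test with a reliability of .71
would need about 164 comparable cases to reach a reliability of .80.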

The ability to acquire (with a fair degree of psychometric validity) the 
knowledge base elements which distinguish experts from novices makes it 
possible to extract and explicitly impart the precise knowledge base elements 
which could expedite the novice's transformation from novice to expert. Thus 
this ability, the sixth of the inherent advantages of ES tools, makes it 
conceivable to produce instructional interventions which could increase the 
efficiency and effectiveness with which the student's diagnostic abilities are 
developed. 

In the early 1980s, one author (FJP) demonstrated that an artificial
intelligence-derived tool could be used to acquire a problem-specific knowledge
base from medically trained individuals and use it to draw reliable, valid, and
logistically feasible inferences about their diagnostic capabilities in the
problem area of Acute Chest Pain. With funding from FIPSE, we set out to
determine: 1) whether the AI tool (called KBIT, the Knowledge-Based Inference
Tool) could provide reliable and valid measures of diagnostic capabilities
across a number of problem areas (i.e., is KBIT generalizable?), and 2) whether
KBIT-derived instruction could result in improved diagnostic capabilities.

We have recently demonstrated KBIT's generalizability by producing reliable
and valid (diagnostic accuracy of experts > novices) measures of diagnostic
performance in each of four distinctly different problem areas (Weakness, Red
Eye, Papulosquamous Disorders and Elevated Creatinine). These studies
involved over one hundred board-certified experts and two hundred medical
students (novices).

KBIT's ability to produce psychometrically sound, problem-specific,
knowledge-based assessments of diagnostic capabilities made it possible to
isolate and identify the knowledge base elements which characterize 'expertise'.
We subsequently demonstrated that two expert/KBIT-derived instructional
approaches significantly improved the diagnostic accuracy of treatment student
groups when compared to a control group and a group of students trained with
conventional instructional approaches. We believe that KBIT can serve as the
foundation for the development of a new generation of psychometrically
sound, 'intelligent' assessment and instructional tools.



Purpose: 

For approximately the past forty years, 8-12% of all patients at autopsy have
been found to have died a premature death from a missed diagnosis. Further, major
missed illnesses with equivocal impact upon survival are present in another 20%
of all autopsies. These findings reinforce the notion that the diagnostic process
is an extremely difficult cognitive task.

Clearly, this less than optimal level of diagnostic performance must derive,
in part, from deficiencies in the assessment methodologies and instructional
approaches utilized during medical training. The purpose of this investigation
was to determine whether artificial intelligence-derived tools could improve
the assessment methodologies and instructional interventions related to
diagnostic capabilities.



Background and Origins: 

The theoretical advantages of AI-derived assessment and instructional
approaches have been briefly discussed above. Here we discuss our emphasis
upon the need to achieve construct validity.

It is quite possible to use traditional assessment instruments and
produce highly reliable test results via either Classical Test Theory or
Generalizability Theory. However, high levels of test reliability do not mean
that the test actually measures the targeted construct. Therefore, all of our
efforts have been designed to first attack head-on the issue of construct validity.

Specifically, we attempted to develop a means of measuring the 'diagnostic
abilities' (in terms of diagnostic accuracy) of medically trained individuals.
Therefore, the construct under investigation and assessment was that of
diagnostic performance. Given that the medical education literature had
determined that diagnostic capabilities were: 1) problem-specific and 2)
knowledge base (and not cognitive skills) dependent, the investigators sought
to develop an assessment instrument that involved the acquisition of a
problem-specific knowledge base from test subjects.

The problem-specific, knowledge-based nature of our testing methodology
allows us to create a testing environment wherein all subjects are required to
describe their knowledge base as related to the same pre-defined set of
common/important diseases for the given problem area and the
common/important signs and symptoms used to diagnose these same diseases.
Thus the investigators have created an even playing field wherein all subjects
must work within the same problem-solving context. Therefore, any
extraneous or hidden advantages which the expert may have, or deficiencies of
the novice, are eliminated.

Consequently, any differences in the AI tool's diagnostic performance must be
related to differences in the expert and novice groups' knowledge bases. The
boundaries of this knowledge base are explicitly delineated via the use of
pre-defined problem area test boundaries (diseases and signs/symptoms). By
demonstrating construct validity (i.e., that the diagnostic performance of
experts is greater than the diagnostic performance of novices), the investigators
can say that these differences can only be due to differences in their knowledge
base as related to the specific problem at hand and as defined by the problem
space boundaries.

The investigators consequently felt less compelled to pursue very high levels of
test reliability (> .80) in these pilot tests. It is important, however, to keep
in mind that with these AI tools, all that would be needed to achieve the
required level of test reliability in any given problem area (given that
construct validity was demonstrated) is to acquire the number of test cases
sufficient to attain that reliability level and simply add them to the test case
data bank. In reviewing the results of our investigations, note that three of the
problem areas achieved reliability estimates (K-R 21) of .89 to .96.
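
The K-R 21 estimate cited above can be computed from the number of test cases,
the mean total score and the variance of the total scores. A minimal Python
sketch follows, assuming each test case is scored as correct or incorrect and
using the population variance of the total scores. The function name is
hypothetical, and any scores passed to it in an example are illustrative, not
data from this project.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21 reliability estimate.

    total_scores: one total (number of cases diagnosed correctly) per subject.
    n_items: number of test cases, each scored correct or incorrect.
    """
    if n_items < 2:
        raise ValueError("K-R 21 needs at least two test cases")
    m = mean(total_scores)
    variance = pvariance(total_scores)  # population variance of total scores
    if variance == 0:
        raise ValueError("K-R 21 is undefined when all scores are equal")
    return (n_items / (n_items - 1)) * (
        1.0 - (m * (n_items - m)) / (n_items * variance)
    )
```

K-R 21 assumes that all test cases are of roughly equal difficulty, so it is
best read as a conservative estimate when case difficulty varies.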

Project Description: This project had essentially two separate components. The
first involved determining the generalizability of KBIT as a reliable
and valid assessment instrument. To determine this, we initially set out to
investigate the reliability and validity parameters derived from studies
involving six separate problem areas. The process of developing a
problem-specific assessment instrument required that we first define the
boundaries for a given problem area. This required, in turn, that we identify the
common/important diseases likely to cause a given problem and the
common/important signs and symptoms that should be gathered in order to
determine the cause of the given problem.



In defining these problem area boundary conditions, we generally utilized the
expertise of two experts in a given specialty. For example, in the problem area
of Red Eye we met with two board-certified ophthalmologists on three
separate occasions with our KBIT tool. During each session we would refine the
number of diseases and signs/symptoms to be included in the problem area.
We used KBIT to help the physicians focus on the essential issues
related to solving the given problem. We termed this
component of the investigation Problem Space Boundaries Definition and
utilized what we have termed Knowledge Engineering techniques to help the
expert consultants gradually refine the boundaries of the problem area.

Once we felt comfortable with the boundaries for a given problem area, we
developed a questionnaire which allowed us to acquire the needed knowledge
base from our targeted groups of experts and novices. The knowledge base
which we needed consisted of each subject's knowledge of the 'relationships'
between each of the diseases in the problem area and the signs/symptoms
included. These relationships can be viewed as representing the individual's
understanding of the percentage of time a patient with a given disease is
likely to have a given finding. The cognitive sciences literature refers to this
type of knowledge as feature frequency estimates. The probability literature
refers to this knowledge as conditional probability estimates.
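
A subject's feature frequency estimates can be pictured as a table of
conditional probabilities, P(finding | disease), that is then used to rank the
candidate diseases for a patient case. The Python sketch below is illustrative
only. This report does not describe KBIT's actual inference rule, so a naive
Bayes style score with equal disease priors is assumed as a stand-in, and every
disease name, finding name and number is hypothetical.

```python
import math

# Hypothetical feature frequency estimates for one subject:
# estimated P(finding is present | disease).
knowledge_base = {
    "disease_A": {"finding_1": 0.90, "finding_2": 0.20, "finding_3": 0.50},
    "disease_B": {"finding_1": 0.30, "finding_2": 0.80, "finding_3": 0.50},
}


def log_likelihood(estimates, case):
    """Log-probability of the observed findings under one disease.

    Assumes findings are independent given the disease (naive Bayes).
    """
    total = 0.0
    for finding, present in case.items():
        # Clamp estimates so an answer of 0% or 100% cannot produce log(0).
        p = min(max(estimates[finding], 0.01), 0.99)
        total += math.log(p if present else 1.0 - p)
    return total


def diagnose(kb, case):
    """Return the disease whose estimates best explain the case."""
    return max(kb, key=lambda disease: log_likelihood(kb[disease], case))


# A patient case: which findings are present (True) or absent (False).
patient_case = {"finding_1": True, "finding_2": False, "finding_3": True}
most_likely = diagnose(knowledge_base, patient_case)
```

For this hypothetical case, `most_likely` is "disease_A", because that disease's
estimates give the observed pattern of findings the higher likelihood.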

We would generally send out our questionnaire to at least 100 board-certified
specialists per problem area. We anticipated, and generally received, a 25-35%
response rate (with the exception of Elevated Creatinine, which produced a
poor response rate of less than 10%). We were able to obtain feature frequency
estimates from students while on different clinical rotations via the
cooperation of the Departments of Internal Medicine and Family Practice. We
usually obtained questionnaires from 60 to 80 medical students per problem
area.

All feature frequency estimates were entered into the KBIT software (designed
by the PI, FJP). Criterion test cases, which were used to challenge the
diagnostic accuracy of each subject, were gathered with the cooperation of the
specialists who had helped define the problem area's test boundaries. Generally
we wanted to accumulate approximately 100 cases per problem area. This proved
to be the most difficult aspect of the investigation, as the consultants were not
enthusiastic about collecting test case data in the specific manner required by
the problem area definition. Nonetheless, with much coaxing and persistence, we
were generally able to acquire the number of test cases sufficient to produce
highly reliable measures for each of the problem areas.
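
The step of challenging each subject's knowledge base with a bank of criterion
test cases can be sketched as follows. This is a toy Python illustration, not
the KBIT software: the scoring rule (a naive Bayes style likelihood with equal
priors), the two knowledge bases and the four-case bank are all hypothetical,
and the estimates are assumed to lie strictly between 0 and 1.

```python
import math


def diagnose(kb, findings):
    """Pick the disease whose estimates best explain the findings."""
    def score(disease):
        return sum(
            math.log(kb[disease][f] if present else 1.0 - kb[disease][f])
            for f, present in findings.items()
        )
    return max(kb, key=score)


def diagnostic_accuracy(kb, case_bank):
    """Fraction of criterion cases for which the top diagnosis is correct."""
    correct = sum(
        1 for findings, truth in case_bank if diagnose(kb, findings) == truth
    )
    return correct / len(case_bank)


# Hypothetical estimates of P(finding | disease) from two subjects.
expert_kb = {
    "disease_A": {"f1": 0.9, "f2": 0.3},
    "disease_B": {"f1": 0.2, "f2": 0.9},
}
novice_kb = {
    "disease_A": {"f1": 0.6, "f2": 0.6},
    "disease_B": {"f1": 0.5, "f2": 0.4},
}

# Hypothetical criterion cases: (findings, confirmed diagnosis).
case_bank = [
    ({"f1": True, "f2": False}, "disease_A"),
    ({"f1": True, "f2": True}, "disease_A"),
    ({"f1": False, "f2": True}, "disease_B"),
    ({"f1": False, "f2": False}, "disease_B"),
]

expert_accuracy = diagnostic_accuracy(expert_kb, case_bank)
novice_accuracy = diagnostic_accuracy(novice_kb, case_bank)
```

Because every knowledge base is scored against the same case bank with the same
rule, any difference in accuracy in this sketch comes only from the estimates
themselves, which mirrors the even playing field the investigators describe.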

Much of the true research in this project involved the various ways in which
the knowledge bases could be manipulated. The real objective of our deepest
levels of research was to gain new insights into how experts structure
their knowledge bases and how these knowledge base structures support the
experts in achieving their higher levels of diagnostic accuracy.



Ultimately we were able to determine that 'disease prototypes', that is,
abstracted representations of ranked and weighted signs and symptoms for a given
disease, appeared to be a parsimonious mechanism for storing and conveying
'expertise'. These prototypes proved to be the most efficient and effective means
of conveying the 'hidden' knowledge of the expert to the novice.
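
One way to picture a disease prototype is as a ranked list of findings, each
carrying a weight. The report does not give the authors' extraction or
weighting procedure, so the Python sketch below makes an assumption: a finding's
weight is its estimated frequency in the target disease minus its average
frequency in the competing diseases, so that findings which both match and
discriminate rank highest. The function name and all values are hypothetical.

```python
def extract_prototype(kb, disease, top_n=3):
    """Rank findings for one disease by an assumed discrimination weight.

    kb maps disease -> {finding: estimated P(finding | disease)}.
    Returns up to top_n (finding, weight) pairs, highest weight first.
    """
    others = [d for d in kb if d != disease]
    if not others:
        raise ValueError("at least one competing disease is required")
    weights = {}
    for finding, frequency in kb[disease].items():
        baseline = sum(kb[other][finding] for other in others) / len(others)
        weights[finding] = frequency - baseline
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical expert-derived feature frequency estimates.
expert_kb = {
    "disease_A": {"finding_1": 0.9, "finding_2": 0.2, "finding_3": 0.5},
    "disease_B": {"finding_1": 0.3, "finding_2": 0.8, "finding_3": 0.5},
}

prototype_a = extract_prototype(expert_kb, "disease_A")
```

In this example `finding_1` ranks first for disease_A, `finding_3` (equally
common in both diseases) carries no weight, and `finding_2` ranks last because
it points toward the competing disease.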

Project Results: 

Table 1. Results of KBIT-based assessment instrument capabilities
(generalizability) in terms of reliability and construct validity (Student's
t test) across multiple problem areas.

Problem area               Reliability estimate    Student's t (one-tailed)
                           (K-R 21) (students)     (Experts > Novices)

Weakness                   .89                     p < .000
Red Eye                    .95                     p < .011
Papulosquamous Disorders   .71                     p < .001
Elevated Creatinine        .96                     p < .000



Table 2. Results of KBIT-derived instructional treatments (groups 3, 4 & 5)
designed to produce diagnostic performance increases.

I. ANOVA: F ratio 5.8074    F probability < .0006

II. Student-Newman-Keuls Procedure

                 Groups
             4   5   3   2   1
Groups
    4
    5
    3
    2        *   *
    1        *   *

* Significantly different groups

Group 1 untrained; Group 2 trained with conventional approaches;
Groups 3, 4, & 5 trained with various KBIT-derived approaches.



The results of this project are very encouraging. We have had a great deal of
success in presenting and publishing our results and have at least as many
potential papers and presentations yet to deliver. Within our own institution,
the Dean has supported the establishment of problem-specific assessment
instruments in each of the core clinical rotations.

We have received acknowledgment from the Association of American Medical
Colleges/Research in Medical Education subgroup (i.e., the awarding of the
Thomas Hale Ham "New Investigator" award) and have been increasingly
asked by American and European medical school faculty members to provide
additional unpublished information regarding the nature and scope of our
AI-related activities. We believe that during the next five years, a small but
significant number of educators interested in the use of AI assessment and
instructional activities and approaches will come forward.

Finally, we also believe that the medical education assessment establishment
will balk at the widespread use of this assessment technology, primarily because
its members have little or no knowledge or understanding of the concepts
surrounding artificial intelligence techniques. However, perhaps by the time the
first decade of the twenty-first century ends, there will be a number of medical
training institutions using these AI-derived tools and techniques in an effort to
truly prepare physicians for medical practice in the twenty-first century.



Summary and Conclusions: Moderately to highly reliable and valid,
problem-specific assessments of diagnostic accuracy are logistically possible.
KBIT-derived, explicitly structured problem- and disease-specific knowledge base
elements and prototypes (see Results Table 2, treatment groups 4 and 5), when
imparted to novice medical students, produce statistically higher levels of
diagnostic accuracy than those of control (see Results Table 2, group 1) or
conventionally trained students (see Results Table 2, group 2).

Work in the two remaining problem areas is underway. One area involves
revisiting the problem of Acute Chest Pain. In this investigation we intend
to determine the degree to which this assessment instrument is capable of
making fine discriminations between subjects. We have already acquired
knowledge bases from a number of residents in training in the area of
Emergency Medicine. Our preliminary results suggest that KBIT can in fact
draw fine levels of discrimination among subjects with varying degrees of
expertise (i.e., residents in their first few months to three years of residency
training).

The second problem area involves the problem of Polyarticular Joint Pain. In 
constructing this problem area we have utilized a more sophisticated approach 
to the construction of the problem space boundaries. That is, we are interested 
in maximizing the amount of discrimination possible in terms of disease-specific 
diagnostic abilities per subject. We hope to begin the data collection for 
this problem area in January 1993. 

The availability of psychometrically sound, problem-specific measures of 
diagnostic capabilities and knowledge base acquisition techniques now makes it 
feasible to use KBIT as the foundation of a new generation of educationally 
sound, 'intelligent' assessment and instructional tools. We caution 
investigators in this area, however, to pay increasing attention to the care 
needed to produce efficient and effective problem space boundaries. That is, if 
the problem space definitions do not allow for the amount of discrimination 
needed to support the drawing of distinctions between experts and novices, and 
now more importantly to us, between one disease and another, then the results 
are likely to be disappointing. Much work needs to be done in this area, an area 
which we have called 'test construction knowledge engineering'. 

The authors have recently submitted a FIPSE proposal designed to take the 
additional steps necessary to develop an "intelligent" assessment and 
instructional tool. We hope that this report substantiates the merit in further 
supporting this line of investigation. Clearly, the PI of this project is committed 
to continuing this line of investigation. Evidence of this comes not only 
from the number of publications and presentations related to this project but 
also from the completion of postgraduate training (Ph.D.) in the area of Computer 
Education and Cognitive Systems at the University of North Texas. Further 
evaluations of the KBIT system will continue. 



Appendices: FIPSE assistance was in general very adequate. We were especially 
appreciative of the support of Saundra Newkirk. 

In terms of reviewing future proposals, the investigators suggest that reviewers 
continue to place heavy emphasis upon the ability of AI tools to demonstrate 
elements of construct validity. Clearly these new tools can produce a 
tremendous amount of instructional material. The real question is whether 
any of this new material (or the AI-derived instructional approach) can 
produce efficient and effective changes in performance. 

Abstract: This paper reviews the progress made towards the development of an Intelligent Computer 
Assisted Instructional tool designed to function in a medical education setting. The tool, called KBIT 
(Knowledge Base Inference Tool), is an expert system-based instrument principally consisting of an 
assessment and a tutorial module. KBIT’s sole purpose is to support the development and refinement of 
the differential diagnostic (DDX) knowledge and skills of medical students. The objective of the 
assessment module is to provide psychometrically reliable and valid measures of several DDX skills. The 
objective of the tutorial module is to create a learning environment wherein students make refinements 
in knowledge base (KB) constructs which result in progress towards the next level of DDX skills. KBIT’s 
proposed educational approach comprises an iterative two-step process consisting of the assessment 
of several DDX skill performance parameters, followed by individualized formative instruction. 



INTRODUCTION 

This paper reviews the progress made towards the development of an Intelligent Computer Assisted 
Instructional (ICAI) tool designed to function in a medical education setting. The ICAI tool, called 
KBIT (Knowledge Base Inference Tool), is an expert system-based instrument principally consisting 
of an assessment and a tutorial module. KBIT’s sole purpose is to support the development and 
refinement of the differential diagnostic (DDX) knowledge and skills of medical students. 

DDX is the keystone intellectual skill of the medical practitioner. The objective of DDX is to 
determine which class of diseases best accounts for the patient’s signs and symptoms. Medical 
practitioners initially use only the data obtained at the patient’s bedside (i.e. historical and physical 
findings, not laboratory data) to reach a “clinical” diagnosis. However, diseases are rarely 
confidently diagnosed with such data. This is because disease states in general lack explicitly defined 
criteria for bedside-based diagnosis, i.e. a list of necessary and sufficient historical and physical signs 
and symptoms. Rather, the practitioner uses soft or fuzzy criteria to formulate a clinical diagnosis 
at the bedside. The practitioner subsequently attempts to confirm the clinical diagnosis with 
laboratory data. In short, the clinical (bedside) component of the diagnostic process represents 
decision making under uncertainty. 



COMPUTATIONAL MODELS OF INFORMATION PROCESSING 
UNDER UNCERTAINTY 

At least three general computational models of information processing under uncertainty have 
evolved [1]: probability, possibility (fuzzy logic or set theory), and certainty theory. The most widely 
utilized models are probabilistic, with Bayes’ theorem perhaps the best recognized. Consequently, many 
researchers in cognition are not as aware of possibility and certainty theories as potentially useful 
information processing models in inherently uncertain decision-making domains. These alternative 
theories are sometimes referred to as deterministic theories. From the perspective of deterministic 
theories, the likelihood with which an exemplar is a member of a given class has nothing to do 
with the a priori occurrence of the given class in the population (as typified by probabilistic 
theories). Rather, a given exemplar is assigned, or determined to have, a “grade of membership” 
for each of several competing classes without consideration of each class’s a priori occurrence. 
Without elaborating, Cohen [2] and Jungerman [3] have argued that probabilistic theories should 
not be unquestioningly accepted as the only valid criterion for measuring the rationality or 
correctness of human decision making under uncertainty. Shortliffe and Buchanan [4] have gone 
further to suggest that probabilistic models such as Bayes’ theorem are not appropriate methods in 
inherently uncertain decision-making domains such as medicine. A deterministic computational 
model functions as a critical component within KBIT’s DDX paradigm. 
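
The contrast between the two families of models can be illustrated with a minimal Python sketch. All names and numbers here (`priors`, `likelihood`, `disease_a`, `disease_b`) are invented for illustration, and the grade of membership is simply assumed to equal how often the finding occurs in each class; this is not KBIT's actual computation, which is not specified at this point in the paper.

```python
# Illustrative values only; none of these numbers come from the paper.
priors = {"disease_a": 0.90, "disease_b": 0.10}      # a priori class occurrence
likelihood = {"disease_a": 0.20, "disease_b": 0.80}  # P(finding | class)


def bayes_posterior(priors, likelihood):
    """Probabilistic view: weight each class's fit by its prior occurrence."""
    joint = {c: priors[c] * likelihood[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


def grade_of_membership(likelihood):
    """Deterministic view (assumed form): ignore priors, use the fit alone."""
    return dict(likelihood)


posterior = bayes_posterior(priors, likelihood)
grades = grade_of_membership(likelihood)

best_by_posterior = max(posterior, key=posterior.get)
best_by_grade = max(grades, key=grades.get)
print(best_by_posterior, best_by_grade)
```

With these made-up values the two approaches disagree: the common class wins under the probabilistic model, while the better-fitting class wins when prior occurrence is ignored.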



THE TENDENCY TOWARDS A DETERMINISTIC APPROACH 
TO DECISION MAKING 

Deterministic models are theoretically and mathematically viable information-processing 
models. Evidence that people actually approach uncertain classification tasks from a 
deterministic rather than a probabilistic perspective comes from the work of Kahneman and 
Tversky [5]. Their frequently referenced study suggests that people perform classification tasks 
based upon the extent to which an exemplar is typical of, or a member of, a class. This is contrary 
to mathematically correct or “normative” probabilistic theories. This deterministic approach to 
classification, i.e. classification via recognition of the degree to which an exemplar is similar to the 
typical class representation, is frequently termed the “Representativeness Heuristic”. 



CLASSIFICATION AND PATTERN RECOGNITION 

The medical cognition literature embraces two primary theories of classification: exemplar and 
prototype theories. These two theories attempt to describe, in greater detail than the 
representativeness heuristic described above, the type of knowledge used and how that knowledge 
is used to perform classification tasks. 

In exemplar theories [6] a clinician performs DDX (disease classification) by recalling the specific 
previously experienced disease exemplar which best matches the presenting case. The diagnosis 
associated with the best matching, previously experienced exemplar, provides the clinician with the 
diagnosis for the presenting case. 

In prototype theories [7, 8] the clinician performs DDX by comparing the presenting case to an 
abstracted representation of each of the possible disease classes likely to account for the case 
presentation. The disease class prototype which best matches the case presentation is the diagnosis 
that will be made by the clinician. 

The types of knowledge (exemplars and prototypes) used in classification differ between the two 
theories. However, it is important to note that both of these classification theories clearly express 
(while the Representativeness Heuristic implicitly suggests) that classification is accomplished via the 
use of a pattern recognition mechanism. The importance of pattern recognition in KBIT’s DDX 
paradigm will be discussed later. 
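
The difference between the two theories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: findings are coded as binary vectors, similarity is a squared distance, and a prototype is taken to be the average of the remembered cases for a class. None of these choices, nor the names `remembered`, `disease_a` and `disease_b`, come from the cited literature.

```python
def distance(a, b):
    """Squared distance between two equally long finding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical previously experienced cases: (binary findings, diagnosis).
remembered = [
    ((1, 1, 0), "disease_a"),
    ((1, 0, 0), "disease_a"),
    ((0, 1, 1), "disease_b"),
    ((0, 0, 1), "disease_b"),
]


def classify_by_exemplar(case, cases):
    """Exemplar theory: return the diagnosis of the best-matching past case."""
    _, label = min(cases, key=lambda ex: distance(ex[0], case))
    return label


def build_prototypes(cases):
    """Prototype theory: abstract one averaged representation per class."""
    grouped = {}
    for findings, label in cases:
        grouped.setdefault(label, []).append(findings)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def classify_by_prototype(case, prototypes):
    """Return the class whose prototype best matches the presenting case."""
    return min(prototypes, key=lambda label: distance(prototypes[label], case))


prototypes = build_prototypes(remembered)
presenting_case = (1, 0, 0)
print(classify_by_exemplar(presenting_case, remembered))
print(classify_by_prototype(presenting_case, prototypes))
```

Both routes rely on matching a presenting pattern against stored knowledge; they differ only in whether that knowledge is a set of individual cases or one abstraction per class.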






ASSESSMENT ISSUES 

The medical education literature contains research sufficient to question the psychometric 
properties (reliability and validity) of DDX assessment instruments. The realization of truly 
efficient and effective DDX-related ICAI tools will not occur unless their developers can first resolve 
these psychometric concerns, for which there are at least three prerequisites. First, there is a need 
to create an explicitly defined and cognitively sound DDX paradigm for modeling a DDX 
assessment instrument. Second, because expertise in general, and DDX skills in particular, are 
problem- and disease-specific, medical educators will need to create an assessment format which is 
capable of measuring competency at the problem- and disease-specific level. Third, these assessment 
instruments must provide reliable and valid disease- and problem-specific measures for DDX skills. 
We have already described a cognition-based DDX paradigm. Possible solutions to the second and 
third prerequisites are now described. 



The reliability problem 

The reliability problem stems from the following two notions. First, the lack of disease criteria 
for clinical diagnosis speaks to the variability with which a disease class will manifest itself in 
different individuals. Second, there are a number of common and important diseases that are likely 
to cause a given medical problem. Consequently, students’ skills for disease- and problem-specific 
DDX can be reliably assessed only by having them solve a number and variety of test cases (perhaps 
six or more) for each of the diseases relevant to the given problem. 

For a medical problem such as “acute chest pain”, for which there are nine common or important 
different causes, it appears that a student would need to be tested with approx. 54 test cases (six 
different cases for each of the nine diseases in the problem area). With conventional DDX 
assessment instruments, a test case takes approx. 5-15 min to work through. With these assessment 
instruments, a prohibitively large amount of time would be required to reliably test each student’s 
DDX skills in this area. 

Utilizing conventional assessment formats, medical educators almost universally use only 1 
or 2 test cases per disease class, or worse, per problem area. By lumping a large number of different 
test cases and question formats together, a respectable reliability coefficient of 0.70-0.80 might be 
achieved. However, in reflecting upon the notion that competency is, at the very least, problem-specific, 
one must ask “What is it that these conventional assessment approaches are measuring?” The 
simple answer is that they do not provide reliable estimates of competency with problem-specific skills. 

With little elaboration, the promise of KBIT as a reliable, problem- and disease-specific 
instrument for assessment comes from three sources. First, expert systems are, by definition, 
problem-specific in application. Second, once a knowledge base (KB) has been input into an expert 
system, there is almost no limit to the number and variety of test cases that it could be given to 
solve. Third, the DDX performance levels achieved by the expert system would reflect the 
diagnostic utility and soundness of the KB of the subject from whom it was extracted. Problem- and 
disease-specific test reliability would be, theoretically, a relatively easy psychometric property to 
achieve. 

The development of an instrument for expert system-based assessment of DDX skills would 
require the creation of an expert system shell capable of extracting a subject’s KB in a time-efficient 
manner. The approach to KB extraction taken by the authors has been described elsewhere [9] and 
will be briefly reviewed later. In studies conducted in two separate problem areas (“Acute 
Chest Pain” and neurological “Weakness” [10]), KBIT produced KR-21 reliability coefficients 
>0.89 with only 100 cases per problem area. 
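
For readers unfamiliar with the statistic, the standard Kuder-Richardson Formula 21 can be sketched in Python as follows. The scores in the example are invented, and the use of the population variance is an assumption (conventions differ); the paper does not report how its coefficients were computed beyond naming KR-21.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21 for a test of n_items right/wrong items.

    total_scores holds one number-correct score per subject.
    """
    m = mean(total_scores)
    var = pvariance(total_scores)  # assumption: population variance
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * var))


# Hypothetical number-correct scores on a 100-case test.
example_scores = [20, 50, 80]
reliability = kr21(example_scores, 100)
print(round(reliability, 3))
```

The formula needs only the number of items and the mean and variance of the total scores, which is why a large test-case bank makes a high coefficient easier to reach.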

The validity problem 

When experts outperform novices in a test of DDX skills, the test is said to have “construct 
validity”. Perhaps the most critical psychometric concern confronting medical educators has been 
that experts do not necessarily perform better than novices with conventional DDX testing 
instruments. 

An inherent capability of an expert system-based assessment instrument is the potential to 
achieve construct validity. Put simply, a knowledge base extracted from an expert should 
outperform the knowledge base of a novice. KBIT has provided valid assessments at the 
disease-specific level [11]. KBIT has also provided valid assessments at the problem-specific level 
in two distinct problem areas (“Acute Chest Pain” and neurological “Weakness” [10]). 



KNOWLEDGE BASE EXTRACTION AND DDX SKILLS ASSESSMENT 

The process used to extract a knowledge base in KBIT utilizes a single, predefined, “bounded” 
problem-space matrix. The matrix columns represent a list of x common or important diseases 
known to cause the problem, while the rows represent a list of y common signs/symptoms 
associated with each of the diseases in the problem space. The KB extraction routine requires each 
subject to fill in the empty cells of the matrix. That is, the student’s task is to declare their 
understanding of the percentage of patients with a given disease who exhibit a given finding. These 
feature frequency estimates define their knowledge of the relationship between each disease and 
sign/symptom (see Fig. 1). 
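
One way to picture the problem-space matrix is as a small data structure, sketched here in Python. The disease and finding names are placeholders and the helper functions are hypothetical; KBIT's own internal representation is not described in this paper.

```python
diseases = ["disease_a", "disease_b", "disease_c"]  # x columns (placeholders)
findings = ["finding_1", "finding_2"]               # y rows (placeholders)


def empty_matrix(findings, diseases):
    """One empty cell per (finding, disease) pair."""
    return {f: {d: None for d in diseases} for f in findings}


def record_estimate(matrix, finding, disease, percent):
    """Store the subject's estimate: % of patients with disease showing finding."""
    if not 0 <= percent <= 100:
        raise ValueError("estimate must be a percentage between 0 and 100")
    matrix[finding][disease] = percent


def unfilled_cells(matrix):
    """List the cells the subject has not yet completed."""
    return [(f, d) for f, row in matrix.items() for d, v in row.items() if v is None]


matrix = empty_matrix(findings, diseases)
record_estimate(matrix, "finding_1", "disease_a", 80)
print(len(unfilled_cells(matrix)))
```

Extraction is complete when `unfilled_cells` returns an empty list, that is, when the subject has supplied x times y estimates.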

Via a series of manipulations, KBIT transforms these relationships into a highly structured 
representation of the subject’s KB, which contains four interrelated, yet distinct, cognitive 
constructs. The first construct is a one-to-one representation of the subject’s simple declarative KB, 
i.e. the original feature frequency estimates. The second construct is a more complex declarative 
KB construct termed a disease prototype (one prototype is created for each disease in the problem 
space). The third type of construct represents a form of procedural knowledge referred to as 
weighting rules. The declarative (prototype) and procedural (weighting rules) knowledge constructs 
are integrated into a fourth construct called a problem-specific DDX schema. 

Fig. 1. Subject’s estimates of feature frequencies. 

The purpose of these transformations is to enable KBIT to use weighting rules and a fuzzy set 
theory-like inferencing mechanism based on pattern recognition to diagnose a collection of test 
cases. Diagnosis is conducted by having KBIT determine the degree to which each test case 
resembles or matches each of its internalized disease prototypes. Thus, KBIT’s DDX 
information-processing paradigm emulates prototype-based classification theories. A test case is said to be 
correctly diagnosed when the disease class which has accumulated the greatest weight, i.e. highest 
degree of “prototype match”, is the same disease class actually diagnosed for the test case. Three 
DDX skills measures are made for each subject. These are diagnostic accuracy, pattern matching 
and pattern discrimination. Diagnostic accuracy is defined as the number of test cases correctly 
diagnosed. Pattern matching is defined as the degree to which each of the subject’s disease 
prototypes correctly matched the findings associated with all test cases representative of the same 
disease. Pattern discrimination is defined as the distance between a correctly diagnosed test case 
and the next most highly weighted disease class, i.e. the second leading hypothesis. Diagnostic accuracy, 
pattern-matching and pattern-discrimination values can be produced for disease-specific and 
overall problem areas. 
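
The three measures can be made concrete with a minimal Python sketch. It assumes a deliberately simple scoring rule, in which a disease accumulates the weights of whichever findings are present; the weights, cases and names are invented, and KBIT's actual weighting rules and inferencing mechanism are not reproduced here.

```python
# Hypothetical weights standing in for one subject's prototypes.
prototypes = {
    "disease_a": {"f1": 0.9, "f2": 0.6, "f3": 0.1},
    "disease_b": {"f1": 0.2, "f2": 0.5, "f3": 0.8},
}

# Each test case: the findings present and the confirmed diagnosis.
test_cases = [
    ({"f1", "f2"}, "disease_a"),
    ({"f2", "f3"}, "disease_b"),
    ({"f3"}, "disease_a"),  # an atypical presentation
]


def match_scores(findings, prototypes):
    """Weight accumulated by each disease class for one test case."""
    return {
        d: sum(w for f, w in proto.items() if f in findings)
        for d, proto in prototypes.items()
    }


def assess(test_cases, prototypes):
    n_correct, matches, margins = 0, [], []
    for findings, truth in test_cases:
        scores = match_scores(findings, prototypes)
        ranked = sorted(scores, key=scores.get, reverse=True)
        matches.append(scores[truth])  # match with the true disease's prototype
        if ranked[0] == truth:
            n_correct += 1
            # distance to the second leading hypothesis
            margins.append(scores[ranked[0]] - scores[ranked[1]])
    return {
        "diagnostic_accuracy": n_correct,
        "pattern_matching": sum(matches) / len(matches),
        "pattern_discrimination": sum(margins) / len(margins) if margins else 0.0,
    }


result = assess(test_cases, prototypes)
print(result)
```

In this toy example two of three cases are diagnosed correctly; the atypical case lowers the pattern-matching average but does not enter the discrimination measure, which is computed only over correct diagnoses.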



CORRELATIONS BETWEEN SKILLS AND CONSTRUCTS 

KBIT’s assessment parameters represent measures of three different levels of DDX skills. 
Diagnostic accuracy represents a coarse DDX skills measure (both for a disease and for the general 
problem level) while pattern matching and pattern discrimination represent two finer, yet distinct, 
DDX skills measures. The authors have attempted to determine the degree to which refinement 
at one level of DDX skills might impact another DDX skill. Preliminary investigations suggest that 
diagnostic accuracy is more dependent upon pattern discrimination skills than pattern matching 
skills [8]. However, because of KBIT’s design, each of the three DDX skills parameters represents 
an estimate of the utility of each subject’s four cognitive constructs. Given these inter-dependencies 
between skills and constructs, the finding that diagnostic accuracy is more dependent upon pattern 
discrimination than pattern matching suggests that it is the distinctiveness between an individual’s 
prototype constructs which best accounts for diagnostic accuracy. This hypothesis represents the 
beginning of efforts to define more precisely the correlations between diagnostic skills and KB 
constructs. 



ADVANTAGES OF AN EXPLICIT COGNITIVE AND INTEGRATED PARADIGM 
FOR ASSESSMENT AND INSTRUCTION IN DDX 

There are several potential advantages to an assessment instrument based on pattern recognition. 
First, there appears to be the capability to provide reliable and valid measures of three different 
problem- and disease-specific DDX skills. Second, there is the potential to correlate these three 
DDX skills performance measures with each of the four distinct yet interrelated cognitive 
constructs which, within the KBIT DDX paradigm, are responsible for the DDX skills performance 
levels achieved. Third, there is the possibility of predicting (in the background) how modifications, not 
just at a given construct level (e.g. weighting rules) but, more so, at a specific aspect of a particular 
construct (e.g. the weighting rule which relates the feature “fever” to the disease class 
pneumonia), would lead to an x% improvement in, for example, the subject’s diagnostic accuracy for 
pneumonia. Fourth, KBIT can use the prototypes and weighting rules derived from an individual 
expert or composite group of experts as the basis for modeling particular constructs or performance 
activities in novices. 

CURRENT AND PROPOSED INSTRUCTIONAL FEATURES 

The immediate challenge is to determine how to integrate KBIT’s current assessment capabilities 
with an instructional module which optimizes learning. The approach taken thus far has been to 
base the construction of the instructional module on the work of Burton [12], who used a 
seven-stage strategy for the development of instructional aids. This approach is illustrated in 
Table 1. 

At Level 1 (Help, the lowest level), the student is provided with the tools and information necessary 
to navigate through the system via built-in cues and instructions. Students are 
informed of the information they need to provide and how to perform specific tasks. In the KBIT 
program this option is fully implemented. 

Level 2 (Assistance) and Level 3 (Empowering tools) are only weakly implemented in KBIT. There 
is no context-sensitive help, nor is there a historical summary of the student’s performance. 
However, there is currently a tool which allows the student to modify feature frequency estimates, 
transforms them into new weighting rules and offers the student an opportunity to view the new levels 
of diagnostic accuracy resulting from these changes. 

Level 4 (Reactive learning) permits the student to propose diagnostic strategies [i.e. determine 
the specific feature(s) to be used, the number of features to be used and their order] and test each 
strategy against the test case data bank. KBIT provides feedback concerning the accuracy of the 
strategy against a specific test case or all test cases in the data bank. Level 4 will support iterative 
interactions with the subjects via a repetitive process of strategy changes and skills re-assessments. 

Level 5 (Modeling) allows the student to observe an expert perform diagnosis on a given case 
and indicates why the expert selected a particular feature in solving the case. Eventually, as 
additional experts are entered, it is possible that a student could choose a specific expert to watch 
or KBIT could match a student with an expert based on similarities between expert and student 
over a number of cognitive constructs or DDX skill performance levels. 

Level 6 (Coaching) is the process of assisting the student with suggestions as to which learning 
options would provide the most valuable information. This is currently planned as being done in 
two ways. First, as the student faces a learning decision (e.g. which construct changes to make), 
he/she can ask for help from the coach. Second, if the student makes an inappropriate selection, 
the coach can interrupt and offer an explanation as to why that choice is not the best and even 
provide the student with a better learning option. Coaching will interact with the subjects at the 
construct levels of prototype and weighting rules modifications. An iterative process of KB 
modification and skills re-assessments is envisioned. 



Level 7 (Tutorial) has not been implemented. However, the intention is to provide the student 
with free-form access to all prior levels so that the individual style of the student can be taken into 
consideration. 

Table 1. Burton’s categories of software aids 
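
The existing tool for modifying feature frequency estimates and viewing the resulting diagnostic accuracy can be pictured with a short Python sketch. The scoring rule (summing the estimates of the findings present), the knowledge base values and the names `what_if`, `student_kb` and `cases` are all assumptions made for illustration, not a description of how KBIT is implemented.

```python
import copy


def diagnose(findings, kb):
    """Pick the disease whose estimates accumulate the greatest weight."""
    scores = {d: sum(w for f, w in est.items() if f in findings) for d, est in kb.items()}
    return max(scores, key=scores.get)


def accuracy(kb, cases):
    """Number of test cases correctly diagnosed."""
    return sum(diagnose(findings, kb) == truth for findings, truth in cases)


def what_if(kb, cases, disease, finding, new_estimate):
    """Return accuracy before and after one trial change, leaving kb intact."""
    trial = copy.deepcopy(kb)
    trial[disease][finding] = new_estimate
    return accuracy(kb, cases), accuracy(trial, cases)


# Hypothetical student knowledge base and test-case bank.
student_kb = {
    "disease_a": {"f1": 0.3, "f2": 0.5},
    "disease_b": {"f1": 0.6, "f2": 0.7},
}
cases = [({"f1"}, "disease_a"), ({"f1", "f2"}, "disease_a"), ({"f2"}, "disease_b")]

before, after = what_if(student_kb, cases, "disease_a", "f1", 0.9)
print(before, after)
```

Working on a copy lets the student explore a change and compare accuracy before and after without committing it to the stored knowledge base.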



CONCLUSION 

The authors have made significant progress towards the development of an expert system-based 
ICAI tool whose single purpose is to support the development and refinement of the DDX 
knowledge and skills of medical students. The majority of the work to date has involved: (1) the 
development of an explicitly defined and sound DDX paradigm which serves as the cognitive 
foundation of the ICAI tool, and (2) the development of a psychometrically reliable and valid 
instrument for problem-specific assessment which measures DDX skills levels in a manner 
consistent with an explicitly defined DDX-skills paradigm. The authors are in the early phases of 
modeling the instructional phases of the ICAI tool. 

The most exciting findings involve those which suggest that the assessment tool has provided 
a robust research environment for exploring the correlations between the DDX skills performance 
levels achieved and the constructs responsible for the DDX skills performance levels. These findings 
suggest that ICAI projects have great potential utility not only as ends in themselves but also as research 
tools to be used to actively model and test information-processing hypotheses. 

(FIPSE) and SmithKline Beecham Foundation. 

Effects of Pattern Matching, Pattern Discrimination, and Experience in the Development of 
Diagnostic Expertise 

FRANK J. PAPA, JAY H. SHORES, and STEVE MEYER 



Pattern recognition, prototypes, and experience play significant 
roles in medical decision making. To study the role that these 
factors play in the development of diagnostic expertise, a 
pattern-recognition-based, prototype-driven model of medical 
decision making was created. The model, an assessment tool 
derived from artificial intelligence (AI), provides valid measures 
of diagnostic accuracy and two prototype-related contributors to 
pattern recognition, that is, pattern matching and pattern 
discrimination. 

In this study an AI assessment tool used disease-by-feature 
frequency estimates from each subject to create disease prototypes 
for each of 9 common causes of acute chest pain. The AI 
tool then used each subject’s 9 prototypes and a 
pattern-recognition-based decision-making mechanism to diagnose 18 
myocardial infarction cases. The data were analyzed to describe 
the role of pattern matching, pattern discrimination, and experience 
in the development of diagnostic expertise for myocardial 
infarction. The following questions are addressed: 

1. Is there a statistically significant relationship between diagnostic 
accuracy and measures of pattern matching and pattern 
discrimination? 

2. Is the effect of pattern matching and pattern discrimination 
on diagnostic accuracy independent of experience? 

Researchers have attempted to determine whether expert/novice 
diagnostic performance differences were primarily due to 
differences in the formation or use of declarative or procedural 
knowledge. Elstein and colleagues¹ and Barrows and Tamblyn², 
among others, attempted to describe expert/novice differences 
with comparisons of procedural knowledge. Despite numerous 
efforts, they did not account for expert/novice differences on the 
basis of procedural knowledge. 

Grant and Marsden³ and Bordage and Zacks⁴ presented evidence 
supporting the existence of expert/novice declarative differences 
in knowledge-base content and knowledge-base structure. 
Their studies did not tie expert/novice differences in 
content-related and structure-related declarative knowledge to 
differences in diagnostic accuracy. Norman and colleagues⁶ successfully 
related diagnostic accuracy to a pattern recognition 
process derived from knowledge of multiple past instances or 
examples. 

Medical decision making is a categorization task. Some cognitive 
scientists believe that, to carry out this task for clinical diagnosis, 
clinicians use declarative and procedural knowledge 
to form a structured knowledge base. Within this framework, a 
physician’s knowledge base contains many elements, some of 
which are conceptual representations of disease classes. A given 
disease-class concept is internalized as a structured set of 
weighted, disease-related features (signs/symptoms). This structured 
set of disease-related weighted features is often referred to 
as a pattern or prototype. These disease-class concepts are used 
by clinicians to classify a patient’s signs and symptoms as being 
due to a specific disease. It is suggested furthermore that diagnostic 
(class categorization) performance is based upon a 
prototype-to-example comparison.⁴,⁶ The physician compares findings 
in the patient with a mental catalogue of disease prototypes. 

Papa and Meyer⁷ designed an AI-derived tool to model this 
explanation of medical decision making. This framework has 
been extended to suggest that the physician’s ability to correctly 
diagnose (recognize) cases depends upon two underlying constructs, 
that is, the degree to which the patient findings match a 
prototype (pattern matching) and the extent to which that prototype 
is distinct from alternative prototypes (pattern discrimination). 
Measures of diagnostic accuracy, pattern matching, and 
pattern discrimination derived from this tool have demonstrated 
construct validity.⁸ In the present study, the role that the two 
prototype-related constructs play in the development of diagnostic 
expertise is explored. 

Methods 

A total of 173 subjects at varied levels of clinical experience 
participated in the study (121 third-year and fourth-year medical 
students at the Texas College of Osteopathic Medicine, 18 
emergency medicine residents, and 34 board-certified emergency 
medicine physicians). 

A “problem space” for acute chest pain was created. It consisted 
of 67 historical and physical findings commonly associated 
with the clinical diagnosis of acute chest pain and a list of 
9 common or important diseases known to cause acute chest 
pain. The 9 diseases were myocardial infarction, myocardial angina, 
pericarditis, pneumonia, pneumothorax, pulmonary embolus, 
dissecting thoracic aortic aneurysm, esophageal-upper 
intestinal disorders, and musculoskeletal disorders. The 67 features 
have been previously described.⁷ These features included 
history findings (e.g., age > 40, male, sudden dyspnea) and physical 
findings (e.g., wheezes, rales, S4 gallop). By predefining the 
differentials and features to be used by all subjects, possible 
differences in the knowledge-base content among subjects were 
eliminated. 

The program required that the subjects declare their knowledge 
concerning the relationship between each of the 9 diseases 
and the 67 features. Their knowledge base took the following 
form: “Within the context of Acute Chest Pain, what percentage 
of patients with <disease> have <finding>”? All subsequent 
performance measures were directly related to differences in the 
subjects’ knowledge of disease-by-feature relationships. 
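
The question form quoted above lends itself to a simple generator, sketched here in Python. The function `build_prompts` is hypothetical, and only three of the 9 diseases and four of the 67 findings named in this section are used to keep the example short.

```python
from itertools import product


def build_prompts(problem, diseases, findings):
    """One question per (disease, finding) pair, in the quoted form."""
    template = (
        "Within the context of {problem}, what percentage of "
        "patients with {disease} have {finding}?"
    )
    return [
        template.format(problem=problem, disease=d, finding=f)
        for d, f in product(diseases, findings)
    ]


diseases = ["myocardial infarction", "pericarditis", "pneumonia"]
findings = ["age > 40", "sudden dyspnea", "wheezes", "rales"]
prompts = build_prompts("Acute Chest Pain", diseases, findings)
print(len(prompts))
print(prompts[0])
```

With the full problem space, the same loop would yield 9 x 67 = 603 questions per subject.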

The AI tool was written in structured BASIC. It used the subjective 
disease-by-feature relationship matrix and a non-Bayesian 
mathematical model to transform each subject’s knowledge base 
into a set of 9 disease prototypes. These prototypes were used to 
infer a diagnosis for each of 18 confirmed myocardial infarction 
cases. The performance of each subject’s prototypes was 
recorded. Measures of diagnostic accuracy, pattern matching, 
and pattern discrimination against each of the 18 criterion cases 
were recorded. Diagnostic accuracy was the number of myocardial 
infarction cases correctly diagnosed. Pattern matching was 



